Abstract

Using transfer entropy, we investigate the strength and the direction of information transfer in the US stock market. From the directionality of the information transfer, we can identify the more influential company in a correlated pair and select the market-leading companies. Our entropy analysis shows that companies related to energy industries such as oil, gas, and electricity influence the whole market.
1 Introduction



Recently, economics has become an active research area for physicists, who have investigated stock markets using statistical tools such as the correlation function, multifractals, spin-glass models, and complex networks [1,2,3,4,5,6]. As a result, it has become evident that the interactions therein are highly nonlinear, unstable, and long-ranged.

All companies in a stock market are interconnected and correlated, and their interactions are regarded as an important internal force of the market. The correlation function is widely used to study these internal influences [7,8,9,10,11]. However, the correlation function has at least two limitations. First, it measures only linear relations, whereas a linear model is in general not a faithful representation of the real interactions. Second, it only tells us that two series move together, not which one affects the other: in other words, it lacks directional information. Therefore the role of participants located at hubs always remains ambiguous: they can be either the most influential ones or the weakest ones, merely following the market trend. It should be noted that introducing a time delay can be a good remedy for these limitations. Some authors use concepts such as time-delayed correlation and time-delayed mutual information, which yield asymmetric matrices and thus preserve directionality [9,12]. If the length of the delay can be determined appropriately, one can also measure the 'velocity' at which the influence spreads. In this paper, however, we rely on a recently devised information-theoretic measure and examine its applicability.
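The two limitations of the correlation function can be illustrated with a short Python sketch (standard library only). The coupled series, the coupling 0.8, and the one-step delay are hypothetical choices for illustration, not market data. The equal-time Pearson coefficient is exactly symmetric in its arguments, whereas the time-delayed correlation is large only in the direction in which one series leads the other.

```python
import math
import random

def pearson(x, y):
    """Pearson (equal-time) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_corr(x, y, lag):
    """Correlation between x(t) and y(t + lag); large when x leads y by `lag` steps."""
    return pearson(x[:len(x) - lag], y[lag:])

random.seed(0)
n = 5000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical follower: y repeats x one step later, plus independent noise.
y = [0.0] + [0.8 * x[t - 1] + 0.6 * random.gauss(0.0, 1.0) for t in range(1, n)]

print(pearson(x, y) == pearson(y, x))   # True: no directional information
print(lagged_corr(x, y, 1))             # roughly 0.8: x leads y
print(lagged_corr(y, x, 1))             # roughly 0: y does not lead x
```

The delayed correlation still captures only linear dependence, which is why a measure that is both nonlinear and directional is attractive.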

Information is an important keyword in analyzing the market or in estimating the stock price of a given company. It is quantified in rigorous mathematical terms [13], and the mutual information, for example, is a meaningful alternative to a simple linear correlation, even though it still does not specify the direction. Directionality, however, is required to identify the more influential of two correlated participants, and it can be detected by the transfer entropy (TE) [14].

TE has already been applied to financial time series by Marschinski and Kantz [15], who calculated the information flow between the Dow Jones and DAX stock indexes and obtained conclusions consistent with empirical observations. While they examined the interaction between two huge markets, we investigate the internal structure of a single market among all its participants.



2 Theoretical Background 



Let us consider two processes, $I$ and $J$. The transfer entropy [14] from $J$ to $I$ is defined as



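$$ T_{J \to I} = \sum p\left(i_{t+1}, i_t^{(k)}, j_t^{(l)}\right) \log \frac{p\left(i_{t+1} \mid i_t^{(k)}, j_t^{(l)}\right)}{p\left(i_{t+1} \mid i_t^{(k)}\right)} \tag{1} $$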



where $i_t$ and $j_t$ represent the states of $I$ and $J$ at time $t$, respectively, and $i_t^{(k)} = (i_t, \ldots, i_{t-k+1})$ and $j_t^{(l)} = (j_t, \ldots, j_{t-l+1})$ denote their $k$ and $l$ most recent states. In terms of relative entropy, $T_{J \to I}$ can be read as the distance from the assumption that $J$ has no influence on $I$, i.e. $p(i_{t+1} \mid i_t^{(k)}, j_t^{(l)}) = p(i_{t+1} \mid i_t^{(k)})$. One may rewrite Eq. (1) as:



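$$ T_{J \to I} = \sum p\left(i_{t+1}, i_t^{(k)}, j_t^{(l)}\right) \log p\left(i_{t+1} \mid i_t^{(k)}, j_t^{(l)}\right) - \sum p\left(i_{t+1}, i_t^{(k)}\right) \log p\left(i_{t+1} \mid i_t^{(k)}\right) = h_I(k;t) - h_{IJ}(k,l;t), \tag{2} $$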



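which follows from the properties of conditional entropy. Here $h_I(k;t) = -\sum p(i_{t+1}, i_t^{(k)}) \log p(i_{t+1} \mid i_t^{(k)})$ is the entropy rate of $I$ given its own past, and $h_{IJ}(k,l;t) = -\sum p(i_{t+1}, i_t^{(k)}, j_t^{(l)}) \log p(i_{t+1} \mid i_t^{(k)}, j_t^{(l)})$ is the entropy rate given the past of both processes. The second equality thus shows that TE measures the change in the entropy rate of $I$ gained from knowledge of the process $J$. Eq. (2) is practically useful, since it decomposes TE into entropy terms, for which well-developed estimation techniques already exist.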
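For discrete (symbolic) data, both Eq. (1) and its decomposition in Eq. (2) can be evaluated directly from frequency counts. The Python sketch below is a minimal plug-in estimator, assuming $k = l = 1$; it is not the correlation-integral estimator used in this paper. The binary test pair, in which $J$ drives $I$ with a one-step delay, is hypothetical.

```python
import math
import random
from collections import Counter

def entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution stored in a Counter."""
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def te_eq1(src, dst):
    """Plug-in T_{src->dst} for discrete sequences with k = l = 1, summing Eq. (1)."""
    n = len(dst) - 1
    triples = Counter(zip(dst[1:], dst[:-1], src[:-1]))  # (i_{t+1}, i_t, j_t)
    next_self = Counter(zip(dst[1:], dst[:-1]))          # (i_{t+1}, i_t)
    self_src = Counter(zip(dst[:-1], src[:-1]))          # (i_t, j_t)
    self_only = Counter(dst[:-1])                        # i_t
    total = 0.0
    for (i1, i0, j0), c in triples.items():
        p_full = c / self_src[(i0, j0)]                  # p(i_{t+1} | i_t, j_t)
        p_self = next_self[(i1, i0)] / self_only[i0]     # p(i_{t+1} | i_t)
        total += (c / n) * math.log(p_full / p_self)
    return total

def te_eq2(src, dst):
    """The same estimate via Eq. (2), writing each conditional entropy as a difference."""
    h_i = entropy(Counter(zip(dst[1:], dst[:-1]))) - entropy(Counter(dst[:-1]))
    h_ij = (entropy(Counter(zip(dst[1:], dst[:-1], src[:-1])))
            - entropy(Counter(zip(dst[:-1], src[:-1]))))
    return h_i - h_ij

random.seed(1)
n = 10000
j = [random.randint(0, 1) for _ in range(n)]
# Hypothetical binary pair: I copies J's previous state 90% of the time.
i = [0] + [j[t - 1] if random.random() < 0.9 else random.randint(0, 1) for t in range(1, n)]

print(te_eq1(j, i), te_eq2(j, i))   # T_{J->I}: clearly positive, both forms agree
print(te_eq1(i, j))                 # T_{I->J}: close to zero
```

The two functions agree to rounding error, because the marginal counts used for the conditional probabilities in Eq. (1) are exactly those entering the joint entropies of Eq. (2).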

There are two common approaches to estimating the entropy of a given time series. First, the symbolic encoding method divides the range of the dataset into disjoint intervals and assigns one symbol to each interval, so that the originally continuous dataset becomes a discrete symbol sequence. Marschinski and Kantz [15] took this approach and introduced the concept of effective transfer entropy. The other approach exploits the generalized correlation integral $C_q$. Prichard and Theiler [16] showed that the following holds for data $i$:

$$ H_q(i, 2\varepsilon) \approx -\log C_q(i, \varepsilon), \tag{3} $$



where $\varepsilon$ determines the size of a box in the box-counting algorithm. We define the fraction of data points that lie within $\varepsilon$ of $i(t)$ by

$$ B(i(t), \varepsilon) = \frac{1}{N} \sum_{s} \Theta\left(\varepsilon - \lVert i(t) - i(s) \rVert\right), \tag{4} $$



where $\Theta$ is the Heaviside step function and $N$ is the number of data points. We compute its numerical value with the help of the box-assisted neighbor search algorithm [17] after embedding the dataset in an appropriate phase space. The generalized correlation integral of order 1 is then given by

$$ \log C_1(i, \varepsilon) = \frac{1}{N} \sum_{t} \log B(i(t), \varepsilon). \tag{5} $$



Notice that $\log C_1$ is expressed as an average along the trajectory $i(t)$. This implies a kind of ergodicity, which converts an ensemble average $\sum_k p_k \log p_k = \langle \log p_k \rangle$ into a time average $\overline{\log p_k(t)}$. Temporal correlations are not taken into account, since daily data already lack much of their continuity.
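The correlation-integral route of Eqs. (3) to (5) can be sketched in Python as follows, combining four joint entropies according to Eq. (2) with $k = l = 1$, i.e. $T_{J \to I} = H(i_{t+1}, i_t) - H(i_t) - H(i_{t+1}, i_t, j_t) + H(i_t, j_t)$. This is a minimal sketch under our own simplifying assumptions: a brute-force O(N^2) neighbor count replaces the box-assisted search [17], distances use the maximum norm, and the test pair is hypothetical Gaussian data in which $J$ drives $I$ with a one-step delay. Because every term is evaluated at the same $\varepsilon$, the combination is simply the TE at resolution $2\varepsilon$.

```python
import math
import random

def block_entropy(points, eps):
    """Order-1 entropy estimate -(1/N) sum_t log B(x(t), eps), following Eqs. (3)-(5).

    B counts the fraction of points within max-norm distance eps of x(t). The point
    itself is included (s = t in Eq. (4)), so B > 0 and the logarithm is finite.
    A brute-force O(N^2) search stands in for the box-assisted algorithm [17].
    """
    n = len(points)
    total = 0.0
    for p in points:
        near = sum(1 for q in points
                   if max(abs(a - b) for a, b in zip(p, q)) < eps)
        total += math.log(near / n)
    return -total / n

def te_correlation_integral(src, dst, eps):
    """T_{src->dst} with k = l = 1 via Eq. (2), each conditional entropy as a difference."""
    i1, i0, j0 = dst[1:], dst[:-1], src[:-1]
    h_i = block_entropy(list(zip(i1, i0)), eps) - block_entropy([(a,) for a in i0], eps)
    h_ij = (block_entropy(list(zip(i1, i0, j0)), eps)
            - block_entropy(list(zip(i0, j0)), eps))
    return h_i - h_ij

random.seed(2)
n = 500
j = [random.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical continuous pair: I follows J one step later, plus independent noise.
i = [0.0] + [0.8 * j[t - 1] + 0.6 * random.gauss(0.0, 1.0) for t in range(1, n)]

eps = 0.5
te_fwd = te_correlation_integral(j, i, eps)   # T_{J->I}: clearly positive
te_bwd = te_correlation_integral(i, j, eps)   # T_{I->J}: near zero
print(te_fwd, te_bwd)
```

No finite-sample correction is applied here, in line with the choice discussed for Eq. (2).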

It is rather straightforward to calculate entropy from a discrete dataset obtained by symbolic encoding. However, determining the partition remains a serious problem, referred to as the generating partition problem. Even for a two-dimensional deterministic system, the partition lines may exhibit considerably complicated geometry [18,19] and thus must be set up with extreme caution [20]. Hence the correlation integral method is often recommended if one wants to handle continuous datasets without oversimplification, and we take this route. In this method, however, one has to determine the parameter $\varepsilon$. In a sense, this parameter defines the resolution, or the scale of interest, just as the number of symbols does in the symbolic encoding method.
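For comparison, a minimal equal-width symbolic encoding can be sketched in Python as below. Equal-width binning is our illustrative choice and is generally not a generating partition; the example shows how the number of symbols, like $\varepsilon$, fixes the resolution at which the data are seen.

```python
def symbolize(series, n_symbols):
    """Equal-width symbolic encoding: map each value to a bin index in 0..n_symbols-1."""
    lo, hi = min(series), max(series)
    if hi == lo:                       # constant series: a single symbol
        return [0] * len(series)
    width = (hi - lo) / n_symbols
    # The min(...) keeps the maximum value in the last bin rather than in bin n_symbols.
    return [min(int((x - lo) / width), n_symbols - 1) for x in series]

data = [0.0, 0.1, 0.5, 0.9, 1.0]
print(symbolize(data, 2))   # [0, 0, 1, 1, 1]
print(symbolize(data, 4))   # [0, 0, 2, 3, 3]
```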

Before discussing how to set $\varepsilon$, we remark on the finite sampling effect. Although it has been pointed out that the case $q = 2$ does not suffer much from the finite number of data points [21], the positivity of entropy is then not guaranteed [14]. Thus we use the conventional Shannon entropy, $q = 1$, throughout this paper. Several works [22,23,24] have addressed corrections to entropy estimation. These correction methods, however, can be problematic when calculating TE, since the fluctuations in the terms of Eq. (2) are not independent and should not be treated separately [25]. We found that a proper choice of $\varepsilon$ is crucial, and decided not to apply the correction terms here.

A good value of $\varepsilon$ discriminates a real effect from zero. Without a priori knowledge, one needs to scan a range of $\varepsilon$ to find a resolution that yields meaningful results for a time series. To reduce the computational time, however, we rely on the empirical observation that an airline company depends strongly on the oil price, while the dependence hardly appears in the opposite direction. Fig. 1 shows this unilateral effect: the major oil companies Chevron and Exxon Mobil influence Delta Airlines. We take $\varepsilon \approx 0.002$, which maximizes the difference between the two directions (Fig. 2), as the appropriate scale for analyzing the data. This value also gives good discrimination over the whole period when we follow the temporal evolution. In Fig. 1, the influence appears reversed at very small length scales. However, TE is known to increase monotonically under refinement of the partition in many cases [25], and a refined partition corresponds to the small length scales covered by small $\varepsilon$ in the correlation integral method. We therefore regard this reversal as a finite-sample effect in this paper, although it seems worth investigating further. Finally, we set $k = l = 1$ in Eq. (1), since other values do not make significant differences.
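The selection rule for $\varepsilon$ (maximizing the difference between the two directions on a reference pair) amounts to a one-dimensional search over a grid. In the Python sketch below, the logarithmic grid, its bounds, and the two toy TE curves are hypothetical stand-ins, not the data of Fig. 2; in practice the callables would wrap a TE estimator applied to, e.g., an oil company and Delta Airlines.

```python
import math

def log_grid(lo, hi, n):
    """n logarithmically spaced values from lo to hi (inclusive)."""
    ratio = (hi / lo) ** (1.0 / (n - 1))
    return [lo * ratio ** k for k in range(n)]

def choose_epsilon(eps_grid, te_forward, te_backward):
    """Return the eps on the grid that maximizes te_forward(eps) - te_backward(eps).

    te_forward and te_backward are callables returning a TE estimate at a given eps,
    e.g. T_{oil->airline} and T_{airline->oil}.
    """
    return max(eps_grid, key=lambda eps: te_forward(eps) - te_backward(eps))

def toy_forward(eps):
    # Hypothetical forward curve peaking at eps = 0.002.
    return math.exp(-math.log(eps / 0.002) ** 2)

def toy_backward(eps):
    # Hypothetical flat backward curve.
    return 0.1

grid = log_grid(1e-4, 1e-1, 31)
print(choose_epsilon(grid, toy_forward, toy_backward))   # about 0.002
```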



3 Data Analysis 

This study uses the daily closing prices of 135 stocks listed on the New York Stock Exchange (NYSE) from 1983 to 2003 ($L \approx 5000$ trading days, $\Delta t = 1$ trading day), obtained from the website [26]. We select only stocks that were listed on the NYSE over the whole period. The companies in a stock market are usually grouped into business sectors or industry categories, and our data contain 9 business sectors (Basic Materials, Utilities, Healthcare, Services, Consumer Goods, Financial, Industrial Goods, Conglomerates, Technology) and 69 industry categories. The following method shows how information flows between the groups.

Suppose that we have time-series data $\{p(t)\}$ representing the daily closing price of a company at time $t$. Stock market analyses usually prefer the log return

$$ i(t_2; t_1) = \log \frac{p(t_2)}{p(t_1)} \tag{6} $$

to the original price itself, since it satisfies the additive property $\sum_{k=0}^{n-1} i(t_{k+1}; t_k) = i(t_n; t_0)$. The log-return transformation also makes the result invariant under arbitrary scaling of the input data. Therefore, to measure the information transfer between two companies, say $I$ and $J$, we create the log-return time series $\{i(t)\}$ and $\{j(t)\}$ from the raw price data. One can then calculate the transfer entropies $T_{J \to I}$ and $T_{I \to J}$ between them using the equations in Section 2.
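The log-return transformation of Eq. (6) and its two stated properties, additivity over consecutive intervals and invariance under rescaling of prices, can be checked with a few lines of Python; the price series is hypothetical.

```python
import math

def log_returns(prices):
    """Daily log returns i(t_{k+1}; t_k) = log(p(t_{k+1}) / p(t_k)), as in Eq. (6)."""
    return [math.log(b / a) for a, b in zip(prices[:-1], prices[1:])]

prices = [100.0, 102.0, 99.0, 101.5, 103.0]   # hypothetical closing prices
r = log_returns(prices)

# Additivity: the daily returns sum to the return over the whole span.
assert math.isclose(sum(r), math.log(prices[-1] / prices[0]))

# Scale invariance: rescaling all prices (e.g. a change of unit) leaves returns unchanged.
scaled = log_returns([1000.0 * p for p in prices])
assert all(math.isclose(a, b) for a, b in zip(r, scaled))
```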

To obtain an overview of the market, we consider groups of similar companies. Let $I$ be a company in group $A$ and $J$ one in group $B$. The information flow index between these two groups is defined as a simple sum:

$$ \rho_{A \to B} \equiv \sum_{I \in A,\, J \in B} T_{I \to J}. \tag{7} $$

In addition, we define the net information flow index, which measures the disparity between the influences of the two groups, as

$$ \sigma_{AB} = \rho_{A \to B} - \rho_{B \to A}. \tag{8} $$

If $\sigma_{AB}$ is positive, we say that category $A$ influences category $B$.
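Eqs. (7) and (8) aggregate pairwise transfer entropies into group-level indices. The Python sketch below assumes the pairwise values are stored in a dictionary keyed by ordered company pairs; the TE numbers and the dictionary layout are hypothetical, for illustration only.

```python
from collections import defaultdict

def flow_indices(te, group_of):
    """Information flow index rho_{A->B} (Eq. 7) and net index sigma_{AB} (Eq. 8).

    te[(I, J)] holds T_{I->J} for each ordered pair of distinct companies;
    group_of[I] gives the group (e.g. industry category) of company I.
    """
    rho = defaultdict(float)
    for (src, dst), value in te.items():
        rho[(group_of[src], group_of[dst])] += value
    groups = sorted(set(group_of.values()))
    sigma = {(a, b): rho[(a, b)] - rho[(b, a)]
             for a in groups for b in groups if a != b}
    return dict(rho), sigma

# Hypothetical TE values, for illustration only.
te = {("XOM", "DAL"): 0.05, ("DAL", "XOM"): 0.01,
      ("CVX", "DAL"): 0.04, ("DAL", "CVX"): 0.02}
group_of = {"XOM": "Major Oil&Gas", "CVX": "Major Oil&Gas", "DAL": "Major Airlines"}
rho, sigma = flow_indices(te, group_of)
print(sigma[("Major Oil&Gas", "Major Airlines")])   # about 0.06 > 0: oil influences airlines
```

By construction $\sigma_{BA} = -\sigma_{AB}$, so only one ordering of each pair carries independent information.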

We examine the market with two grouping methods: by business sector and by industry category. Grouping into business sectors, however, does not exhibit clear directionality: the dominant direction of influence between sectors $A$ and $B$ simply alternates over time. In other words, $\sigma_{AB}$, the difference between $\rho_{A \to B}$ and $\rho_{B \to A}$, is almost zero over the whole period. This lack of clarity arises because a business sector contains so many diverse companies that their directional effects cancel out. In addition, if we construct the asset tree as a minimum spanning tree, each business sector forms a subset of the asset tree, and these subsets are connected mainly through a hub. Each business sector can thus be said to form a cluster [8], with no significant direct links among them.
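The asset-tree argument can be made concrete with Kruskal's algorithm. This Python sketch assumes the commonly used correlation-based distance $d_{IJ} = \sqrt{2(1 - c_{IJ})}$, where $c_{IJ}$ is the correlation coefficient of the two return series (written $c$ to avoid a clash with the flow index $\rho$ of Eq. (7)); the four hypothetical companies form two tightly correlated pairs, which the tree joins through a single link.

```python
import math

def minimum_spanning_tree(names, corr):
    """Kruskal's algorithm on the distance d = sqrt(2 (1 - c)).

    corr[(a, b)] is the correlation coefficient c for each unordered pair of names.
    Returns the list of tree edges (a, b, d).
    """
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((math.sqrt(2.0 * (1.0 - c)), a, b) for (a, b), c in corr.items())
    tree = []
    for d, a, b in edges:            # shortest distances (strongest correlations) first
        ra, rb = find(a), find(b)
        if ra != rb:                 # the edge joins two components, so no cycle forms
            parent[ra] = rb
            tree.append((a, b, d))
    return tree

names = ["A", "B", "C", "D"]
corr = {("A", "B"): 0.9, ("C", "D"): 0.8, ("B", "C"): 0.3,
        ("A", "C"): 0.2, ("B", "D"): 0.15, ("A", "D"): 0.1}
for a, b, d in minimum_spanning_tree(names, corr):
    print(a, b, round(d, 3))
```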

Hence we employ the industry-category grouping, which is finer than the business sectors. We exclude categories that contain only one company; Table 1 lists the remaining industry categories used in the analysis.

Consistent with our earlier observation, oil companies and airline companies are again found to be related in a unilateral way: category 20 (Major Oil&Gas) continuously influences category 19 (Major Airlines) during all 14 periods under examination ($\sigma_{20,19} > 0$). Such relations are easily found among other categories as well. For example, category 20 always influences categories 15 (Independent Oil&Gas), 22 (Oil&Gas Equipment&Services), and 23 (Oil&Gas Refining&Marketing). It also affects category 27 (Regional Airlines) in 13 periods and maintains its influence over the whole market during 11 periods (Fig. 3).
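Counting, for each pair of categories, the periods in which $\sigma_{AB} > 0$ gives the entries visualized in Fig. 3. How the 14 periods were formed is not specified here; the consecutive, equal-length windows in this Python sketch are our assumption, and the per-period $\sigma$ values are hypothetical.

```python
def split_periods(series, n_periods):
    """Split a series into n_periods consecutive, equal-length windows (remainder dropped)."""
    size = len(series) // n_periods
    return [series[k * size:(k + 1) * size] for k in range(n_periods)]

def influence_counts(sigma_by_period, groups):
    """count[(a, b)] = number of periods with sigma_ab > 0, i.e. a influences b."""
    return {(a, b): sum(1 for sigma in sigma_by_period if sigma[(a, b)] > 0)
            for a in groups for b in groups if a != b}

windows = split_periods(list(range(5000)), 14)
print(len(windows), len(windows[0]))   # 14 357

# Hypothetical per-period net flows between categories 20 and 19.
sigma_by_period = [{(20, 19): s, (19, 20): -s} for s in (0.3, 0.1, -0.05, 0.2)]
print(influence_counts(sigma_by_period, [19, 20]))   # {(19, 20): 1, (20, 19): 3}
```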

It is well known that the economy depends greatly on the supply and price of energy such as oil and gas, and the transfer entropy analysis quantitatively confirms this empirical fact. The three most influential categories, measured by the number of periods, are categories 10 (Diversified Utilities), 12 (Electric Utilities), and 20. All ten companies in categories 10 and 12 are again related to the energy industry, being engaged in holding, energy delivery, and the generation, transmission, distribution, and supply of electricity.

In contrast, airline companies are sensitive to the tone of the market: they receive information from other categories almost all the time (category 19: 11 periods; category 27: 12 periods). As might be expected, category 8 (Credit Services) and category 9 (Diversified Computer Systems, which includes only HP and IBM in our data) are also market-sensitive.



4 Conclusion 

We calculated the transfer entropy from daily data of the US stock market. Transfer entropy quantifies general (including nonlinear) correlation together with the direction of information flow. It thus reveals how information flows among companies or groups of companies, and distinguishes market-leading companies from market-sensitive ones. Consistent with common knowledge, energy, including natural resources and electricity, is shown to strongly affect economic activity and the business barometer. This analysis may also be applied to predicting the stock price of a company influenced by others.

In short, TE shows promise as a measure for detecting directional information flow. We suggest that the merits and drawbacks of TE be examined in detail relative to those of classical methods such as correlation matrix theory.
Fig. 1. Transfer entropy as a function of $\varepsilon$ between (a) Delta Airlines (DAL) and Chevron (CVX) and (b) DAL and Exxon Mobil (XOM).



Fig. 2. Net information flow index $\sigma$ between (a) DAL and CVX and (b) DAL and XOM.
Table 1
Industry category index in alphabetical order

#    Industry category
1    Aerospace/Defense
2    Auto Manufacturers
3    Beverages
4    Business Equipment
5    Chemicals
6    Communication Equipment
7    Conglomerates
8    Credit Services
9    Diversified Computer Systems
10   Diversified Utilities
11   Drug Manufacturers
12   Electric Utilities
13   Farm&Construction Machinery
14   Health Care Plans
15   Independent Oil&Gas
16   Industrial Metals&Minerals
17   Information Technology Services
18   Lumber, Wood Production
19   Major Airlines
20   Major Oil&Gas
21   Medical Instruments&Supplies
22   Oil&Gas Equipment&Services
23   Oil&Gas Refining&Marketing
24   Personal Products
25   Processing Systems&Products
26   Railroads
27   Regional Airlines
28   Specialty Chemicals
29   Etc.
Fig. 3. $\sigma_{AB}$ over the whole 14 periods. The degree of darkness represents the number of periods in which $A$ is affected by $B$; the diagonal entries $\sigma_{AA}$ are left blank. For example, category 10 affects almost all the other categories and is affected by categories 12 and 25 in only a few periods. The row of a market-leading category is bright on average, while that of a market-sensitive one is dark.